ABSTRACT

Recent empirical evidence concerning sex and racial
bias in testing is discussed in terms of three primary sources of
bias: (1) content of the test itself, (2) atmosphere in which the
test is administered, and (3) the use to which the test results are
put. Test content that is demonstrably more difficult for one group
than another should be (1) eliminated in any setting in which equal
difficulty is assumed or (2) perhaps more important, the biased
content should be examined closely for possible causes of the
difference, leading to modification of educational practices for the
low scoring groups. Special care should be taken routinely to see
that minority groups are made to feel comfortable and are not
intimidated by their surroundings. Pertaining to fairness in test
use, methodological developments undermining the traditional
statistical model of fairness previously accepted without question
are described in some detail. The "new measures" approach to test
bias is seen as essentially an abandonment of, or a reduced emphasis
on, the traditional measures of status of aptitude and achievement.
(Author/BC)

A little over four years ago, the author reviewed the
literature on test bias under the title Testing Practices,
Minority Groups, and Higher Education (1970). The
present paper attempts to update the topics discussed,
reviewing the research progress made, and in certain
areas revising the outlook presented there. Since many of
the same problems are common to employment testing,
both that and educational testing are included within the
following discussions.

To summarize the 1970 paper very briefly, it grouped
the possible sources of test bias, or unfairness, into
three categories and discussed the research findings on
each.

First, and most commonly perceived as the source of
bias, is the content of the test itself. Very reasonably, if a
test is biased, it must be because of what is in that test.
Within this category are the questions of predictive
validity for minority groups. But a second category is
that of the atmosphere in which the test is administered,
the environment being an important determinant of how
the student or applicant actually performs on the test
regardless of content, even including whether or not the
person comes forward to take the examination at all.

The third category of unfairness in testing has to do
with the use to which the test results are put. In the 1970
paper I stated:

Test use, however, is seldom regarded as a subject for at
least the ordinary kinds of research effort. Unfairness from
any source, however, can be the weak link in an otherwise
strong chain and misguided use of test results can be a
very serious defect in a testing program (p. 7).

It will be seen later in this paper that one shift of
emphasis occurring is the increasing realization that test
use may be the one most important source of unfairness,
deserving a great deal more attention than the others,
certainly more than it has received in the past.



The Issue of Differential Validity

In 1970, the conclusion was that perhaps some of the
criticism of test content was inappropriate, based as it
was on the belief that the tests do not predict as
accurately for minorities as for the majority group. The
evidence available at that time was considerable, but the
strength of the conviction that differential predictive
validity existed was even more considerable.

In the intervening years, the empirical evidence has
continued to accumulate and continues to point in the
same direction. Davis and Temp (1971) collected data on
the relationship between freshman grades and SAT
scores for groups of black and white students at the same
colleges. They found wide variation in validities across
the institutions, and a tendency for validities to be higher
for white groups than black groups, although the
difference was not large. They found, similar to findings
reported in the earlier paper (Flaugher, 1970, p. 13), that
there was a tendency for black students to be predicted
to do better than they actually do when a common
prediction equation is used for both black and white
students. The authors reemphasize the need to keep
verifying the validities on a local level for both groups.
Pfeifer and Sedlacek (1971) and Kallingal (1971) also
found overprediction for black students in two other
settings, but stressed the need for separate prediction
equations.
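
The overprediction finding can be illustrated with a small sketch.
The numbers below are invented for illustration and are not data
from Davis and Temp or from any other study cited here; the function
names are likewise hypothetical. A single least-squares line is
fitted to the pooled data, and the mean residual (actual minus
predicted grade) is then computed for each group. A negative mean
residual means that the common equation predicts the group to do
better than it actually does.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_residual(pairs, intercept, slope):
    """Average of (actual - predicted); a negative value is overprediction."""
    residuals = [y - (intercept + slope * x) for x, y in pairs]
    return sum(residuals) / len(residuals)


# Hypothetical (test score, first-year grade average) pairs, invented
# for illustration only. Both groups have the same slope, but group B
# earns lower grades than group A at any given test score.
group_a = [(400, 2.0), (500, 2.5), (600, 3.0), (700, 3.5)]
group_b = [(300, 1.2), (400, 1.7), (500, 2.2), (600, 2.7)]

# One common prediction equation fitted to everyone at once.
pooled = group_a + group_b
intercept, slope = fit_line([x for x, _ in pooled], [y for _, y in pooled])

resid_a = mean_residual(group_a, intercept, slope)
resid_b = mean_residual(group_b, intercept, slope)

print(f"common equation: grade = {intercept:.3f} + {slope:.4f} * score")
print(f"mean residual, group A: {resid_a:+.3f}")
print(f"mean residual, group B: {resid_b:+.3f}")
```

In this invented example the common line sits above group B's own
line, so group B is overpredicted and group A is underpredicted by
the same amount. Fitting separate equations, as some of the authors
above recommend, would bring both mean residuals to zero.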

Schmidt et al. (1973) reviewed the results of a number
of different studies in industrial settings and concluded
that:

. . . [f]or both subjective and objective criterion measures
observed frequencies of both kinds of single-group validity
(significant for whites but not for blacks and vice versa)
were not significantly different from those predicted by the
null differences model. These findings cast serious doubt
on the existence of single group and differential validity as
substantive phenomena (p. 5).

Additional results are available from an extensive
study by Campbell and associates (1973), which
concludes that

... it appears that differential validity, if not entirely a
statistical artifact where it does appear, is at best an
isolated phenomenon (p. 425).

Meanwhile certain legal activity may be prolonging
the vitality of the conviction that differential validity is an
issue. The federal government is attempting to produce
new guidelines for employment practice in American
industry, and one reason for the prolonged delay in
producing them has been the issue of differential validity,
the existence of which is still considered to be a matter of
debate (Singer, 1974). Substantive research findings are
once again being challenged by what "everyone knows"
regardless of the data.

Regardless of the evidence to indicate that the tests
can do as good a job of prediction for minorities as for
the majority group, the specific content of the tests has
come under continuing scrutiny. Angoff and Ford (1971)
conducted item analyses for both black and white high
school students, and found that the item-by-group
interactions could be reduced considerably by relatively crude
matching techniques, suggesting that the interactions
might disappear altogether with more careful matching
on performance level. Further, they found significant
item-by i-m- interactions for black students, casting
doubt on the existence of a single body of content that, if
included, would be uniformly advantageous to black
students.

Certainly particular test content that is demonstrably
more difficult for one group than another should either
be 1) eliminated in any setting in which equal difficulty is
assumed or 2), perhaps more important, the biased
content should be examined closely for possible causes of
the difference, leading to suggestions about modifying
educational practices for the low-scoring groups. Thus,
in an extensive study by Breland et al. (1974), the
performances of 10 sociocultural groups over six cognitive
tests were analyzed for instability of difficulty level across
groups. Quoting Breland et al.:

The greatest instabilities were noted among the
vocabulary items. These vocabulary instabilities appeared
to be attributable to linguistic differences, primarily those
existing between Spanish-speaking groups and other
groups. It was also observed that reading test items having
material relevant to black culture were relatively easier for
blacks than were other items in the test battery. A perhaps
significant finding occurred in the analysis of mathematics
items. Mathematical knowledge obtainable from everyday
life situations, such as how to count money, were relatively
less difficult for minority groups (than other classes of
items). In contrast, very simple mathematical problems,
such as determining the value of square roots of whole
numbers less than ten, seem extraordinarily difficult for
minority groups. Since such knowledge, though easily
obtained, is usually only obtained in a school setting, what
is suggested is that most minority groups in the United
States receive seriously deficient schooling in mathematics
(p. iii).

In another, more unique approach to the question of
biased test content, Green (1972) suggests separate
application of the same test construction techniques for
each ethnic group, using a single common pool from
which items are to be selected for inclusion in a test. To
the extent that the selected items overlap for the various
groups, the test may be said to be unbiased. Green's
empirical findings would indicate that indeed the overlap
of selected items is frequently less than perfect, and his
technique can be a source of information about the
impact of test content interacting with the ethnic group
identity of the test taker.

In general, however, as with other related studies of
test content, the nature and size of the changes that
would be made in test content on the basis of these data
are such that only small differences would be realized in
the test scores received by individual test takers. This
conclusion is eventually drawn by most lines of
investigation in pursuit of the topic. The findings are important
for greater understanding of both testing and the nature
of educational opportunities for minorities, but mere
elimination of some small subset of test items does not
appear to be the answer to the much more sizeable
problem being referred to as test bias.

The Atmosphere of Testing

In the second general category of unfairness discussed in
the 1970 paper, namely atmosphere, very little
substantive research has been forthcoming that would
change the conclusions made in that paper. Additional
studies of such peripheral characteristics of the test as
time limits have been conducted, such as that by Evans
and Reilly (1972) on the Law School Admission Test.
Their conclusions are typical:

1) The test is somewhat more speeded for [predominantly
black groups] than for regular candidates, 2)
reducing the amount of speededness produces higher
scores for both [groups], and 3) reducing speededness is
not significantly more beneficial to [the predominantly
black groups] (p. 123).

Another aspect of the atmosphere of testing is the
amount of sophistication or experience needed to overcome
the idiosyncratic characteristics of the testing
situation, including such things as the type of test item
and the answer sheet format. To give the most favorable
representation of their abilities, students must overcome
the medium and concentrate on the message, the test
content itself. Thus, some people are characterized as
being able to take tests and others as not, independent of
their competence with the subject matter. The middle-class
student is seen by minorities as being able to take
tests because of greater experience with the "tricks"
required to perform well.

There is some evidence that test wiseness amounts to
an ability to take advantage of the violations of good
item-construction principles (McMorris et al., 1972),
suggesting that more carefully constructed measures are
less susceptible to these effects, although positive results
have been achieved with elementary-age students on a
standardized reading test (Callenbach, 1973). A very
careful and intensive study on high school students by
Evans and Pike (1973) did achieve positive results for
three types of mathematics items, but the 21 hours of
instruction over the seven weeks of the experiment were
of such a nature and intensity that they might be
regarded as a legitimate mathematics curriculum rather
than a content-free coaching session.

Epps (1974) has continued the work reported at length
in an earlier paper by Katz (1970) and, in general,
continues to find that atmosphere variables, such as the race
of the examiner and the perceived use to which the test
results were to be put, usually have a detectable effect on
test performance and motivation of minorities and,
perhaps, the majority group members as well. So many
atmosphere variables exist and remain unstudied,
however, that little progress has been made in untangling
the multiple interactions of such things as personalities,
achievement motivation, perceived comparison groups,
perceived likelihood of success, and perceived status of
the examiner.

In spite of incomplete research evidence, the general
conclusion on the question of atmosphere effects is that
special care should be taken routinely to see that
minority groups are made to feel comfortable and are not
intimidated by the surroundings. Such things as unusual
distances of travel to the testing center, recruitment
publications that discourage minorities, and insensitive
treatment at the center are being seen as potentially
important influences on the test performance of
minorities and, therefore, to be attended to regardless of the
lack of extensive firm evidence that they make a
difference. Similarly, although no evidence existed at the time
to indicate that it makes for better performance, test
makers began including such things as reading passages
by and about black people in the hope that some
beneficial effect would be realized. Breland et al. (1974) later
found some justification for this practice, as described
earlier, though the effective difference on total test scores
seems to be too small to be detected except in very large
samples.



Sex Bias in Testing

Women are not the usual sort of minority group and do
not have the usual sort of difficulties with testing. It is
frequently the case, for example, that instead of earning
lower test scores, the women in a particular sample may
score better than their male counterparts, but frequently
they are nevertheless restricted to filling a certain
percentage of the available openings or are eliminated or
discouraged on some other grounds. Sometimes when
the openings for women are very desirable and scarce,
this leads to large discrepancies between the average
aptitude test scores for men and women, providing hard
evidence of the existence of discriminative practices.
These practices are being challenged and gradually
being abandoned, along with the customary preference
shown to men in promotions to higher levels in business
and industry.

The tests, however, if not primary instruments of
discrimination against women, are a source of great
irritation and perhaps even real unfairness, because of
the sexism in the image of women that is projected by the
language. Although the nature of the language is not the
responsibility of the test makers, publishers of all kinds,
including those of tests, are under increasing pressure to
eliminate the practice of the preferred masculine
pronoun and the dominance of masculine referents from
their products ("mankind," "firemen," "the average
American drinks his coffee black"). Controversy
continues over whether or not satisfaction of this demand
does such violence to the usual linguistic habits that it
distracts attention from the task of the test itself.
McGraw-Hill (undated) has provided guidelines for its
publications that solve much if not all of the anticipated
awkwardness. To the extent that such changes can be
made, testing materials as well as teaching materials,
reference works, and nonfiction works in general can be
legitimately asked to eliminate the role language has
played in reinforcing the existing inequality between the
sexes.



Test Use: Fairness in Statistical Selection Models

It is on the topic of test use that the most dramatic
developments have taken place since this writer's earlier
review. Additional research efforts were called for at that
time, necessarily not of the usual sort, to study the
fairness of the utilization of test information. It was not
anticipated at that time that methodological developments
were to be next, ones that would undermine the
traditional statistical model of fairness previously accepted
virtually without question. In the next few pages, these
developments are described in some detail, inasmuch as
they are quite possibly of more ultimate import than are
the previously described studies of differential validity
and unfairness in testing atmosphere.

Earlier in this paper, the evidence for underprediction
of minority group members was found to be virtually
absent in that the existing tests were found to predict
about as accurately for minorities as they do for the
majority group. This approach to the determination of
fairness is based on a statistical model that has come to
be known as the traditional, or Cleary, model, after the
researcher first employing it in a study of test bias (1968).
In 1971, however, both Thorndike and Darlington
separately showed that the traditional definition has
difficulties. Thorndike showed that regardless of the
equivalence of the relationship between test and
predicted criterion for the two groups, such a statistical
model is unfair to the lower scoring group,
in the sense that the proportion of that group that
qualifies on the test will turn out to be smaller than the
proportion who would be qualified on the job.

Certainly that definition, stated as it is in terms of
proportions selected versus proportions who would succeed,
seems to be reasonable and desirable. But still another
definition was soon advocated by Cole il^lM who
preferred to look at the situation in this manner: Given one
member of the majority group and one member of the
minority group, both of whom would succeed if selected,
the procedure is unfair unless they have the same
probability of being selected.

That definition seems as appealing as either
Thorndike's or Cleary's, but the three are in conflict with each
other and cannot be advocated or practiced
simultaneously. Darlington's (1971) conceptualization of the
problem is seen to be the most accurate description of all
the models simultaneously, because he demonstrated
that all three definitions could be encompassed within a
model that retained a single, but variable-weight,
correction factor. The particular size of that factor was to
be determined subjectively, that is, on the basis of other
factors that lie completely outside the statistical model
itself. This variable factor was to be added to the
criterion scores (job performance, measures of college
success) of the lower scoring group, and the size of the
factor was to be determined by the particular set of
chosen values the selector or the selecting institution
wished to invoke. No longer was it to be possible to claim
that the objective statistics used in selection were the
court of last appeal, the ultimate determination of just
what was fair; rather, the objective procedures were seen
to be a strictly mechanical implementation of the
definition of fairness that had been chosen by the selector.
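
The difference between the definitions can be made concrete with a
small sketch. The counts below are invented for illustration, and
the function and key names are hypothetical. For each group the
sketch computes the quantity that Thorndike's definition asks to be
equal across groups (proportion selected divided by proportion who
would succeed) and the quantity that Cole's definition asks to be
equal (the probability of being selected, given that one would
succeed). The Cleary definition concerns the regression of criterion
on test score and is not computed here.

```python
def fairness_indices(counts):
    """Compute two fairness indices for one group.

    counts maps each cell of a selected/rejected by
    would-succeed/would-fail table to a number of applicants.
    """
    total = sum(counts.values())
    selected = counts["selected_success"] + counts["selected_fail"]
    successful = counts["selected_success"] + counts["rejected_success"]
    return {
        # Thorndike: proportion selected relative to proportion who would succeed.
        "thorndike_ratio": (selected / total) / (successful / total),
        # Cole: probability of selection among those who would succeed.
        "cole_probability": counts["selected_success"] / successful,
    }


# Hypothetical applicant counts, invented for illustration only.
majority = {
    "selected_success": 40, "selected_fail": 10,
    "rejected_success": 20, "rejected_fail": 30,
}
minority = {
    "selected_success": 15, "selected_fail": 5,
    "rejected_success": 25, "rejected_fail": 55,
}

majority_indices = fairness_indices(majority)
minority_indices = fairness_indices(minority)

for name, indices in (("majority", majority_indices), ("minority", minority_indices)):
    print(name,
          f"Thorndike ratio = {indices['thorndike_ratio']:.3f},",
          f"Cole probability = {indices['cole_probability']:.3f}")
```

With these invented counts both indices are lower for the minority
group, so the procedure would be judged unfair under either
definition. Changing the cutting score for one group until one index
is equal across groups will not, in general, make the other index
equal as well.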



Must We Have a "Quota System" To Be Unbiased?

Up to this point, largely for the purposes of ease of
presentation, this values and selection fairness discussion
has proceeded as if there were a clear path from our
previous naive practices to the adoption of the
enlightened corrected criterion model, leading to greater
equity and mutual understanding. This is not the case.
The focus of the difficulty is upon the correction factor of
the new formulation. When one group's scores, either on
the criterion or on the selection test, are treated
differently than those of the other group, this amounts to
establishing a double standard. It is precisely the same
practice in concept as establishing two different cutting
scores on a selection test and the same, in effect, as
deciding upon a quota for one or the other of the
subgroups. Particularly when stated in terms of a quota
system, the result is often a strong negative reaction on
the part of many observers. In such cases, the one value
system, which endorses this sort of fairness in selection,
takes second place to one that endorses strictly
equivalent selection standards regardless of ethnic
identity. All of the newly developed models of selection
fairness have this perceived flaw, in that they require
some correction of one group's scores to the
disadvantage of the others. Only the traditional model applies no
correction factor.
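
The equivalence between a correction factor and two cutting scores
can be sketched as follows. The sketch assumes, purely for
illustration, a single linear prediction equation with a hypothetical
intercept and slope, and it applies the correction to the predicted
criterion; none of the numbers come from the studies discussed.
Adding a constant K to one group's predicted criterion and applying
one criterion cutoff selects the same applicants as leaving the
criterion alone and lowering that group's test cutting score by K
divided by the slope.

```python
def cutoff_on_test(criterion_cutoff, intercept, slope, correction=0.0):
    """Test score at which predicted criterion plus correction reaches the cutoff."""
    return (criterion_cutoff - intercept - correction) / slope


# All values below are hypothetical and chosen only for illustration.
INTERCEPT = 0.5          # intercept of the prediction equation
SLOPE = 0.004            # criterion units gained per test-score point
CRITERION_CUTOFF = 2.5   # minimum predicted criterion for selection
K = 0.2                  # correction factor added for one group

cut_uncorrected = cutoff_on_test(CRITERION_CUTOFF, INTERCEPT, SLOPE)
cut_corrected = cutoff_on_test(CRITERION_CUTOFF, INTERCEPT, SLOPE, correction=K)

# Test scores of five hypothetical applicants from the corrected group.
scores = [400, 440, 460, 500, 520]

# Route 1: add K to the predicted criterion, keep one criterion cutoff.
selected_by_correction = [
    s for s in scores if INTERCEPT + SLOPE * s + K >= CRITERION_CUTOFF
]

# Route 2: leave the criterion alone, use a lower test cutting score.
selected_by_lower_cut = [s for s in scores if s >= cut_corrected]

print(f"cutting score without correction: {cut_uncorrected:.1f}")
print(f"cutting score with correction:    {cut_corrected:.1f}")
print("selected via correction factor:", selected_by_correction)
print("selected via lower cutting score:", selected_by_lower_cut)
```

Both routes select the same applicants, which is why any model with
a nonzero correction factor can be restated as a double standard on
the test itself.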

This dilemma is not confined to uninformed public
opinion; it extends into our courts and into our
declarations of public policy as well. The United States Supreme
Court is currently on record as endorsing only the
traditional model, implicit in their declaration that "race,
religion, nationality, and sex become irrelevant" (Griggs
v. Duke Power Company, 401 U.S. 424), thereby ruling
out differential score correction based on race. On the
other hand, a more recent decision in the Supreme Court
of the State of Washington has ruled that, at least in
educational selection practice, racial distinctions can be
made for compensatory purposes (DeFunis vs.
Odegaard, see Ginger, 1974). Until the matter is finally
ruled on by the United States Supreme Court, however,
this is likely to be seen as legally unsettled.

The conflicting attitude toward compensation causes
a great deal of disagreement about which particular
model is appropriate. If it is acknowledged that an
injustice has occurred in the past to a particular subgroup
of the population such as an ethnic minority, it may or
may not follow that some sort of compensation on the
part of society is appropriate. If compensation does
follow, a public policy like that expressed by the late
Lyndon Johnson would be applicable:

To be black in a white society is not to stand on level and
equal ground. While the races may stand side by side,
whites stand on history's mountain and blacks stand in
history's hollow. Until we overcome unequal history, we
cannot overcome unequal opportunity.

Not a white American in all this land would fail to be
outraged if an opposing team tried to insert a twelfth man
in the football lineup to stop a black fullback on the
football field. Yet off the field away from the stadium, outside
the reach of the television cameras and the watching eyes
of millions of their fellow men, every black American in
this land, man or woman, plays out life running against
the twelfth man of a history that they did not make and a
fate they did not choose (New York Times, December 26,
1972).

On the other hand, if the position taken is that
whatever injustice may have occurred in the past is no reason
for an injustice of another sort in the present, then the
policy should be one like that of former President Nixon:

In employment and in politics, we are confronted with
the rise of the fixed quota system, as artificial and unfair
a yardstick as has ever been used to deny opportunity to
anyone (Time, October 9, 1972).



The way to end discrimination against some is not to
begin discrimination against others (Miami Beach, August
23, 1972; Newsweek, September 18, 1972).

Clearly, the question of test bias has led us into areas
far more complex than are capable of resolution with
ordinary sorts of testing research.

Test Use: Irrelevant Selection Standards

There is another sense in which test use is a matter of
concern, apart from the debate over the meaning of the
various statistical models. If people are being hired for a
job that requires very little use of vocabulary, then it is
not appropriate to select those applicants to be hired
solely on the basis of a test of vocabulary. If there is no
reason to suppose that the possession of a high school
diploma is necessary to the successful accomplishment of
a particular job, then non-high-school graduates should
not be excluded from consideration for that job.

These are seemingly quite acceptable statements, but
in fact their violation is evidently widespread. Add to this
the facts that a greater percentage of minority groups
score poorly on vocabulary tests and more of them fail to
graduate from high school, and the result is a situation
that, intended or not, constitutes an effective means of
discriminating against minority groups, awarding them
fewer opportunities to prove themselves on the job, and
perpetuating a lower standard of living.

A vocabulary test may be a perfectly legitimate,
carefully constructed and administered test, but one
intended for, say, prediction of success in college English
courses rather than toward deciding who would make a
good fire fighter. Being used as described, however, there
is no question that the test use is biased and biased in
particular against minority groups. If that same test were
to be employed in a college, however, then the bias in
that test would be absent. The bias, in other words, can
exist totally in the misuse of the test rather than in some
internal or peripheral characteristics of the test itself.

Boehm looked at 13 recent studies dealing with
Negro-white differences in employment and training
selection procedures. She found that 100 of the 160
validity coefficients were not significant for either group.
"The use of nonvalidated methods for selection," she
stated, "is apparently not uncommon" (1972, p. 37), and
she pointed out that this frequently excludes a
disproportionate number of Negroes for reasons unrelated to
job performance.

The Call for New Measures

By way of summarizing to this point, there is ample
evidence that if we mean by bias that the tests do not
predict as accurately for one group as the other, then the
tests are not biased when they are used in the
appropriate settings. In addition, we have seen that the search
for biased content in tests themselves has been rather
unproductive and frequently leads instead to questions
about the quality of previous education. In spite of these
developments, the accusations continue, and the popular
impression remains that the tests are biased against
minorities. In addition, it is sometimes claimed that what
is needed to overcome this bias are "new measures." The
term "nontraditional means of assessment" is also
heard, and that "means of fairly measuring the amount
of knowledge retained by \ • regardless of his or
her individual background" be developed.

This general line of criticism is rooted in the
conviction by many that minority groups possess talents and
attributes that are unique to their groups and are
valuable and important, and these attributes are
completely absent from those tests developed by and for
the dominant white majority (Blake 1971; Brazziel 1972;
Cameron 1970). This, then, is another sense of the word
bias: that the tests may be accurate and appropriate for
some segments of the population, but they are not aimed
at, and, therefore, cannot document, the unique
attributes of people from minority cultures.

The primary difficulty with this approach, of course, is
the fact that if the test score is to be useful, it must be
related to some alternatives or actions in the society, such
as providing a prediction of success or failure at a job or
in additional educational pursuits. If certain talents
remain undocumented within minority groups, a
possible reason is that they are yet to be considered
sufficiently important to society to document. In the current
changing atmosphere, the priority for attending to
undocumented talents may be upgraded; however, for
the most part, research on this aspect has been slow and
confined largely within the existing parameters of human
aptitude technology, seeking to find new patterns of
known aptitudes rather than striking out in search of
entirely new ones.

In the earlier review, for example, this author
discussed the findings of Lesser, Fifer, and Clark (1965) and
Stodolsky and Lesser (1967), which documented the
existence of differential patterns of ability in several
minority groups. These same patterns were at high levels
or low levels depending on the socioeconomic status of
the children within the ethnic group, but they
demonstrated their viability by remaining constant within those
groups. These studies were of first graders in New York
and Boston schools. Flaugher (1971) analyzed some
available data from four similar ethnic groups on four
tests taken by inner-city eleventh graders in Los Angeles,
and the resulting patterns strengthen the belief that such
patterns do characterize the groups. This can amount to
stereotyping the various ethnic groups, however, and has
given rise to fears of educational resegregation on the
basis of strengths rather than ethnic identity, with the
results being the same second-class treatment for
minorities. Thus, the fears of how the test data might be
used to discriminate between the minorities and the
majority group come in direct conflict with the need for
special treatment of the uniqueness of the minorities.
Lesser, for one, has become frustrated by these
developments:

In the light of these attacks, fewer and fewer young
social scientists are undertaking research on culture and
cognition, because the penalties to this research are all too
severe. We are facing a self-declared moratorium on
research . . . about the connections that exist between
cultural conditions and cognitive growth (1972, p. 24).

One approach, in its beginning state, is advocated by
Gordon (1974) and it appears to be an important
response to the general call for new measures. Gordon is
proposing a fundamental shifting of the focus of testing.
Current tests, he states, are all directed toward assessing
the status of a person at a particular time, whether the
test's aim is describing aptitude or achievement.
Knowledge of a person's status is sometimes useful, of course,
but an important aspect of that person remains
undescribed, that of the processes the person has used to
arrive at that status. Far more important to an educator
interested in assisting a student, for example, is the
nature of the processes that are being used for learning
rather than the status of an individual at any particular
time. When the question of minority education is
addressed, the answers, the solutions, are in terms of
appropriate diagnosis and prescriptions rather than
prediction, and this is best done in terms of process
variables rather than descriptions of status.

The new measures being proposed, then, are not
simply extensions of the existing categories of aptitude
and achievement; they are an entirely new class of
measures directed toward aspects of human behavior
largely neglected until now. The little previous work that
exists includes that of Herbert Birch and his associates
(Thomas, Chess, and Birch) and the earlier attempts
by Else Haeussermann (1958). The Thomas,
Chess, and Birch studies have described nine categories
of temperamental traits that Gordon (1974) suggests are
a good starting point for these studies: 1) activity level, 2)
rhythmicity, 3) approach/withdrawal, 4) adaptability, 5)
intensity of reaction, 6) threshold of responsiveness, 7)
quality of mood, 8) distractibility, 9) attention span and
persistence. Further, Haeussermann developed an
interview technique in which she described the style of
learning and its developmental level, with the purpose of
prescribing the best possible methods of instruction for
a child. From these elemental beginnings, Gordon hopes
to increase the understanding and use of these
approaches to assessment. The "new measures" approach
to the problem of what is wrong with tests, and, thus, the
problem of test bias, is, then, essentially an abandonment
of, or at least a reduced emphasis on, the traditional
measures of status of aptitude and achievement.
Have We Really Been Talking about Bias?

The review up to this point has covered the definitions of
bias that have to do with the content, atmosphere, and
the ways in which a test can be used in both fair and
unfair systems of selecting and the prospects for
nontraditional means of assessment.

But there are some critics of testing who would deny
that the real bias problem has been discussed at all. To
those, not only are all the statistical studies totally beside
the point, they may even amount to attempts to confuse
the issues or divert attention from where the real
problems lie. To them, the existence of culturally biased tests
is not in dispute but is simply a known and established
fact.

In a sense, the statement about the cultural bias is
true, because certainly the culture is reflected in that
culture's tests, and, in circumstances such as licensing
examinations, the test is often the only tangible
indication of the standards and requirements for that aspect of
the culture. If the culture itself is perceived as biased
against minorities, then certainly its examinations can be
expected to be biased as well. If this is the sense in which
test bias is used (that is, as part of a more general belief
that the entire society is biased), then a case can be made
that the problem lies not with the test but in the nature of
the entire culture. Many spokesmen are likely to deny
this position as legitimate, however, because, since tests
serve as the gatekeepers for much of society's rewards,
the tests make an appropriate first target in the battle to
create a more just society.

To some minority spokesmen, the description of
validation statistics showing that minority groups are
predicted as well as the majority group and the
essentially negative results from studies of biased content are
no response at all to their criticisms of testing, no matter
what the final results. Rather, no other statistical
evidence is needed than the underrepresentation of
minorities in the meaningful, lucrative, and prestigious
positions in the society. Society is seen to be treating the
minorities poorly, and the testing industry is a visible and
active component of that poor treatment.

Robert L. Williams, for example, has described the
following circumstances, which he claims characterize
the educational fate of minority students. First, as the
minority student initially enters school, he very likely
comes from such an impoverished environment that he is
less prepared for school than the typical white student.
The early testing results reflect this fact, and the lower
test scores are used to justify the tracking of such
students into "special" classes, which are special in that
they amount to the abandonment of effort for those
students. Instead of the extra attention such test results
might seem to call for, there is, in fact, less effort
expended and following that, of course, less expectation of
success. Mercer (1973) has provided a careful
documentation of this process in the California schools. Sure
enough, when the next round of testing is conducted, the
special students score even more poorly than before.
This, in turn, justifies further negative decisions about
the usefulness of additional educational effort,
culminating in a denial of access to a college education because
the student is "unprepared."

The tests are initially used to condemn the students to
an inferior education, are then used to document that
same fact, and then finally deny further opportunity to
the same students. And that, says spokesman Williams,
is what is biased about testing (1974). No validity
coefficients or elaborate item analyses of tests are really
addressing this basic question at all, and so no amount of
such data will ever convince the spokesmen that tests are,
in fact, fair and that the criticisms are unfounded.

So testing is biased, our educational system is biased,
our employment practices are biased, our entire society is
biased. As part of this larger biased network, testing
cannot, therefore, attempt to evade the blame by
claiming that it only reflects the biased nature of the rest of
society. But what about the validity studies on the
predictive power of the tests? Doesn't this show that the
tests are doing exactly what they are claiming to do?
When low-scoring students, minority or majority, are
admitted to a demanding college curriculum, they tend
to fail in large numbers. What is biased about an
accurate prediction?

A minority spokesman's reply to this might well be
that the tests represent the first barrier, the first of many
between minorities and successful participation in
society. Once this first barrier is eliminated, then the
next one will be dealt with. If this turns out to be the
inability of the colleges to take the minority students as they
are and educate them, then the next target will be that
fact, and steps will be taken to deal with that. Tests are
not the only barrier, but they are frequently the first, and
hence the first that should be dealt with.

This position on the part tests play in allocating
educational opportunities has necessarily undergone
some reexamination in light of declining enrollments in
the accredited colleges of the nation: there are currently
thousands of empty seats available and colleges are
welcoming anyone with virtually any test scores. Many
colleges are in desperate financial condition because of
declining income from tuition while costs are undergoing
inflation, which means that more students who fill those
seats must pay their own way rather than hope for a
subsidy from the college. And since minority group members
are very likely to have limited financial means, they are
unable to take advantage of the opportunity. So the
effect is the same: higher education is being denied the
minorities, but now the reason is financial rather than
biased testing.

There are still many highly selective colleges that have
far more applicants than available spaces, however, so
this characterization does not apply uniformly across
colleges, and in such settings, test scores are still an
important part of the selection process. Thus, questions
of test bias might still arise, but in such settings,
predictive validities constitute a more relevant response. But
to describe the primary barrier to minorities as one of
testing, while ignoring the very considerable financial
barriers that, in fact, are on the increase, seems
unrealistic and misleading.

What about a Moratorium on Testing?

Some individuals and groups feel so strongly about the
damaging effects of testing that they have advocated a
moratorium on testing. At the 1974 meeting of the
NAACP, for example, a resolution was adopted that
demanded "a moratorium on standardized testing
wherever such tests have not been corrected for cultural
bias. . . ."

Advocates of testing, however, worry about the
consequences of not testing, citing the necessary return to
subjective impressions gained from interviews, a procedure
not likely to favor minorities, and the lack of a yardstick
by which the educational establishment can be held
accountable for the job it is doing (Messick and
Anderson, 1970). In any case, it remains unclear just what
impact these resolutions may have on testing. One
moratorium-like step has been imposed by the test publishers
themselves (Smith, 1974) involving the use of the
National Teachers Examinations scores to determine
salary levels for practicing teachers in South Carolina.
Since the examinations are intended as measures of
academic achievement rather than teaching performance
itself, this constitutes a misuse, and the publishers have
refused to report scores to the State Department of
Education until the practice stops.

A reply to the advocates' fears about nontesting is that
of Gunning (1971), who states his opinions on the
consequences:

If standardized tests were not used, I do not feel that
there would be an increase in discrimination, but a
decrease. No longer would a prospective employer have the
excuse that Blacks are unqualified but would possibly hire
the Black persons and let the actual performance on the
job be the test. I further do not feel that the personal
interview method of appraisal is the evil that it is thought to be.
In a personal interview, such things as motivation,
enthusiasm and desire for the job and determination to
succeed can be detected. These are great determining
factors as to whether or not one will be successful in his
work (p. 76).

Thus, the return to subjective standards is evidently
seen as an advantage to minorities. It is an ironical
historical development that objective testing, once seen
as the factor that permitted meritocratic principles to
predominate over the previous aristocratic means of
selection and was thus the benefactor of minority groups,
is now seen as a barrier that must be overcome by them
(Cross, 1971, p. 2).

Concluding Remarks

At the beginning of this paper, it was stated that a shift
of emphasis has occurred since the first attempts to deal
with the question of test bias. By now perhaps it is clear
that there has been an increasing realization on the part
of those concerned with testing that a test cannot be
biased in the abstract: it must be biased or unbiased in a
particular use. Regarding an IQ test as a fixed, culturally
fair measure of a person's "raw" ability rather than as a
reflection of culture-bound achievement over time is an
example of bias. The test itself, however, cannot be either
biased or unbiased. Requiring a high school diploma for
a job may be biased in its effect, on the other hand, even
when the intent of the selector is only to select those who
will do the best job. Certainly test content and testing
atmosphere should be constantly explored for
indications that any subgroups are being handicapped
unfairly, but by far the most dangerous, and the most
difficult, source of bias is what people do with the
information, intentionally or not. Test scores cannot, at
present, measure many of the most highly valued aspects
of human behavior and are not likely ever to do so.
Motivation and creativity, however defined, are simply too
elusive for standardized measurement. Further, test
scores are all subject to change, sometimes by dramatic
amounts, and therefore should never be regarded as
immutable. To do poorly on a test is not to be
condemned forever to society's reject pile. The personal
worth of an individual is not summarized in an IQ score
(Flaugher, 1974), even though the public seems to want
to overinterpret it this way.

Misuse of test information, then, has a widespread and
significant impact on the lives of minorities, even as it is
being acknowledged that such information is necessary
to maintain educational accountability by an objective,
publicly agreed-upon standard. Test makers and
interpreters alike must be held responsible for proper
interpretation of the measurements, to a degree far greater
than has been the case in the past.